{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python variables - behind the scenes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We will now examine how Python stores objects in memory, and the link between variables and memory locations. You might be wondering why you need to worry about this, but understanding it is essential in order to make the best use of Python's capabilities and avoid mistakes/bugs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Assignment and modification" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Consider the following two examples. First:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a\n", "print(a, b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 4\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This should hopefully make sense so far." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now consider the following example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2, 3, 4]\n", "b = a\n", "a.append(5)\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, modifying ``a`` modified ``b`` too! This is less intuitive. But if we then do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 9\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, changing ``a`` did not change ``b`` - what is happening?"
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The key is to understand that doing:\n", "\n", "    variable = something\n", "\n", "will change which object ``variable`` is pointing to in memory (**assignment**). On the other hand, when calling a method with:\n", "\n", "    variable.method()\n", "\n", "some (but not all) methods will modify the object that ``variable`` points to **in-place** (more information below)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's go over the examples above, but this time we will explicitly discuss the **variables** and the **objects in memory**. If we do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a\n", "a = 4\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "then what is happening is the following." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "First, when doing ``a = 2`` we create an object in memory holding the value ``2`` and make the variable ``a`` point at it." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "By doing ``b = a``, we make the variable ``b`` point at the same object as ``a``." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "And finally, by doing ``a = 4`` we re-assign ``a`` to point at a different object in memory (containing ``4``), but ``b`` still points at the original object (``2``)."
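] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of the three steps above, the built-in ``is`` operator (standard Python, but an addition for illustration here) checks whether two names point at the same object:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a\n", "print(a is b)  # True: both names point at the same object\n", "a = 4\n", "print(a is b)  # False: a was re-assigned, b was not\n", "print(b)       # 2: b still points at the original object"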
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now if we follow the same logic for the second example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2, 3, 4]\n", "b = a\n", "a.append(5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "we again start off by creating space in memory for the list ``[2, 3, 4]``, then we point the variable ``a`` to that location." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "By doing ``b = a``, we then point ``b`` to the same location as ``a``, so **the list exists only once in memory** (this is very important)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We now **modify, in-place,** the object that ``a`` is pointing to with ``a.append(5)``. The concept of modifying the object is very important: we are not creating a new list; it is still in the same place in memory, even if it now has one extra element.\n", "\n", "This means that since ``b`` is pointing to the same place in memory, it will also see a list with (now) four elements!" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Then, if one does ``a = 9``, one is not modifying the list, but instead re-assigning ``a`` to point to a different object in memory with the value ``9``." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to talk about this behavior, we use the terms **copying** and **referencing**. When we do:\n", "\n", "    variable = something\n", "\n", "the **value** is created when ``something`` is evaluated. The assignment merely creates a pointer (“reference” is just a fancy name for that) from a name to that value, and several names can point to the same value."
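] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The list example can be sketched in full as follows. The ``is`` checks are an addition for illustration: they show that ``a`` and ``b`` name a single list until ``a`` is re-assigned." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2, 3, 4]\n", "b = a\n", "a.append(5)    # modifies the one shared list in place\n", "print(a is b)  # True\n", "print(b)       # [2, 3, 4, 5]\n", "a = 9          # re-assigns the name a; the list itself is untouched\n", "print(a is b)  # False\n", "print(b)       # [2, 3, 4, 5]"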
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Another important point is that what is on the right-hand side gets evaluated first, and will (conceptually) result in the creation of a new object unless the ``something`` is a reference already (in which case ``variable`` and ``something`` will just refer to the same value). In the following cases, ``something`` is either a “literal” (i.e., the representation of a value in the source code) or an expression that computes a value, and a new value will be created:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a + 1\n", "c = b * 2\n", "print(a, b, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the second assignment in the following, ``something`` is a reference, and hence no new object is being created:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2, 3, 4]\n", "b = a  # b points to the same object as a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In case you're uncertain at some point, Python's built-in ``id`` function tells you the identity (in memory) of its argument; two names that refer to the same object give the same ``id``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "id(a), id(b), id(c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Copying" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some cases, the behavior described above is not desirable, and we want to make a true copy, not just a reference, *because we want to change* ``b`` *without changing* ``a``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from copy import deepcopy\n", "a = [2, 3, 4]\n", "b = deepcopy(a)\n", "a.append(5)\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``copy`` module contains a function ``copy``, too, which makes only a shallow copy. 
If you want to really understand what's going on, it will probably help to create a nested list (as in ``[list(range(2)), list(range(3))]``), copy that and manipulate the inner lists.\n", "\n", "Note that slicing (usually) creates a copy, too (careful with numpy arrays, though), which is why in quite a bit of source code you see slices when a copy is desired:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = list(range(4))  # in Python 3, range() is not a list, so convert it first\n", "b = a[:]\n", "id(a), id(b)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned above, some *methods* modify the object **in-place**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [1, 2, 3]\n", "a.append(5)  # modifies ``a``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and some will return a new object rather than modifying the original." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = 'hello'\n", "s.upper()  # returns a copy of the string in uppercase without modifying s" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It should be clear from the documentation (e.g. ``s.upper?``) how a particular method behaves." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mutable vs immutable objects" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Some objects are **immutable**, which means that they cannot be modified - examples include ``float``, ``int``, ``str``. For instance, when doing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 1.\n", "a = 2. 
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the second line, a new object is created in memory for ``2.``, and ``a`` now points at that object, not at ``1.`` (in other words, the float is not being changed; it is ``a`` that is pointing to a different object)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "On the other hand, ``list``, ``dict``, and Numpy arrays are **mutable**, which means the object can be modified:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [1, 2, 3]\n", "a.append(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the second line, ``a`` still points at the same list, but the list has now been modified." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A final but important point is that when passing variables to functions, the function receives references to the same objects (no copy is made), so:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def do(x):\n", "    x.append(1)  # modifies the caller's list in place\n", "\n", "a = [1, 2]\n", "do(a)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The following, however, just changes which value ``x`` references inside ``do``, and thus has no effect outside of ``do``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def do(x):\n", "    x = 0  # re-assigns x to 0, but only in the function\n", "\n", "a = [1, 2]\n", "do(a)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Copying and Referencing Numpy arrays" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "With Numpy arrays, one has to be 
particularly careful with the copying/referencing distinction. With a few exceptions (and superficially contrary to the behaviour of almost all other Python objects), most slicing operations in Numpy return **references** to the data, not copies:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x\n", "y[3] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This is similar to lists, but now consider the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x[::2]\n", "print(y)\n", "y[3] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Even though we took a slice with a given start, end, and step, the resulting array was still just a reference, or **view**, of the data in the original array (note that for lists, ``x[::2]`` returns a copy!). 
This can be very handy when combined with masking, where assigning through a boolean mask modifies the original array in place:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "x[x < 5] = 0.\n", "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "One notable exception to the referencing is indexing with a list of indices:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x[[1, 3, 2, 2]]  # returns a new array, not a view\n", "y[0] = 9\n", "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As before, you can explore this further to understand in what cases references or copies are made. However, be aware that the ``id`` of a view *will* be different from that of the original array, even though the view is actually pointing to a subset of the original array's data." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In the case of Numpy arrays, one can force a copy by doing:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x.copy()\n", "y[0] = -1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before you start cursing the Numpy authors because it might seem they were out to confuse you: they did this because very common operations become very fast this way, and in practice it is much less of a trap than you may suspect."
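] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because ``id`` cannot distinguish a view from a copy, one way to check is sketched below. It uses ``np.shares_memory`` and the ``base`` attribute, both part of Numpy but not otherwise used in this notebook; the names ``view`` and ``fancy`` are chosen only for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "x = np.arange(10)\n", "view = x[::2]            # a slice: shares data with x\n", "fancy = x[[1, 3, 2, 2]]  # indexing with a list: a new array\n", "print(np.shares_memory(x, view))   # True\n", "print(np.shares_memory(x, fancy))  # False\n", "print(view.base is x)              # True: the view is built on x"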
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following questions are just to test your understanding of variable assignment. You don't need to write any code: just try to think of what the output will be, then try it out to check whether you got it right:\n", "\n", "What will ``a`` be after the following?\n", "\n", "    a = [1, 3., [1, 2, 3], 'hello']\n", "    b = a[0]\n", "    b = 4.\n", "\n", "What will ``c`` be after the following?\n", "\n", "    c = [1, 3., [1, 2, 3], 'hello']\n", "    d = c[2]\n", "    d.append(8)\n", "\n", "What will ``e`` be after the following?\n", "\n", "    e = [1, 3., [1, 2, 3], 'hello']\n", "    f = e[2]\n", "    f = [1, 2]\n", "\n", "What will ``g`` be after the following?\n", "\n", "    g = [1, 2, 3, 4]\n", "    h = g[::2]\n", "    h[0] = 9\n", "\n", "What will ``i`` be after the following?\n", "\n", "    import numpy as np\n", "    i = np.array([1, 2, 3, 4])\n", "    j = i[::2]\n", "    j[0] = 9" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can try here to see if your guess is correct!\n" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 1 }